A Robust Method for Detecting DB-Outliers from High Dimensional Datasets

Authors

Yuan LI

Hiroyuki KITAGAWA

Abstract

Outlier detection is a popular technique used in many modern applications, such as financial analysis and fraud detection. As data descriptions become more complex, the dimensionality of the datasets being processed keeps increasing. However, current research finds that it is extremely difficult to pick out outliers directly from high dimensional datasets, owing to the curse of dimensionality. Moreover, general methods require certain decisive parameters to be set in advance. Such parameters are usually closely tied to the data distribution, so users often have no idea what values are appropriate without examining the dataset beforehand. To address these problems, we introduce a method to discover exceptional objects that match users' intentions in high dimensional datasets. Users find it easier to provide some outlier examples that reflect their intentions than to determine proper parameters. We make use of these outlier examples, examine the behavior of their projections, and find an optimal subspace. The concept of DB (Distance-Based) outliers is then employed to detect outliers in the optimal subspace. Our proposed method is robust: it tolerates noise and inconsistencies mixed into the outlier examples. Experiments are conducted on both synthetic and real datasets. The results show that the proposed approach is effective and efficient at detecting outliers that correspond to users' intentions in high dimensional datasets.
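To make the distance-based step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used DB(p, D) definition, under which an object is an outlier if at least a fraction p of the objects in the dataset lie farther than distance D from it. The function name `db_outliers`, its parameters, and the toy data are hypothetical. The subspace is simply passed in as a list of column indices, because the abstract does not describe how the optimal subspace is selected.

```python
import math
from typing import List, Sequence


def db_outliers(points: Sequence[Sequence[float]],
                subspace: Sequence[int],
                p: float,
                d: float) -> List[int]:
    """Return indices of DB(p, d)-outliers after projecting onto `subspace`.

    Assumption: a point is an outlier if at least a fraction `p` of all
    points in the dataset lie farther than Euclidean distance `d` from it.
    """
    # Project every point onto the chosen dimensions only.
    projected = [[row[j] for j in subspace] for row in points]
    n = len(projected)
    outliers = []
    for i, a in enumerate(projected):
        # Count points farther than d; the point itself has distance 0.
        far = sum(1 for b in projected if math.dist(a, b) > d)
        if far >= p * n:
            outliers.append(i)
    return outliers


# Toy data (hypothetical): dimensions 0 and 1 carry structure,
# dimension 2 is widely spread noise.
points = [
    [0.0, 0.0, 5.0],
    [0.1, 0.0, -3.0],
    [0.0, 0.1, 9.0],
    [0.1, 0.1, 1.0],
    [3.0, 3.0, 2.0],
]

in_subspace = db_outliers(points, subspace=[0, 1], p=0.75, d=1.0)
in_full_space = db_outliers(points, subspace=[0, 1, 2], p=0.75, d=1.0)
```

In this toy setting, only the last point is flagged in the two-dimensional subspace, whereas in the full space the noisy third dimension pushes every point far from every other and all five are flagged. This is a small-scale illustration of why the abstract searches for a subspace before applying the distance-based test.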

Similar resources

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

Example-Based DB-Outlier Detection from High Dimensional Datasets

Outlier detection is an important problem that has applications in many fields. High dimensional datasets are common in such applications. Among the existing outlier detection methods, Distance-Based outlier (DB-Outlier) detection is one of the most generalizable and simplest approaches. It finds outliers by calculating distances between data points. However, in high dimensional space, data dis...

Robust Subspace Outlier Detection in High Dimensional Space

Rare data in a large-scale database are called outliers that reveal significant information in the real world. The subspace-based outlier detection is regarded as a feasible approach in very high dimensional space. However, the outliers found in subspaces are only part of the true outliers in high dimensional space, indeed. The outliers hidden in normal clustered points are sometimes neglected i...

Finding Key Knowledge Attribute Subspace of Outliers for High Dimensional Dataset

Detecting outliers is an important task in many applications. Since most applications possess high dimensional data, traditional outlier detecting methods will become inefficient in such cases. To solve the problem, we propose the concept of outlying reduction by extending attribute reduction in rough set theory. Additionally, we define the key knowledge attribute subspace (KKAS), which can pro...

Publication date: 2008
